Reliability estimation of a statistical classifier
Authors
Abstract
Pattern classification techniques derived from statistical principles have been widely studied and have proven powerful in addressing practical classification problems. In real-world applications, the challenge is often to cope with unseen patterns, i.e., patterns that are very different from those examined during the training phase. The problem with unseen patterns is that the classifier output is inaccurate in regions of the pattern space where the density of training data is low, which can lead to a false classification. This paper proposes a method for estimating the reliability of a classifier to cope with these situations. Existing methods for quantifying reliability are often based on the class membership probability estimated from global approximations. The proposed method instead takes into account the local density of training data in the neighborhood of a test pattern. The calculations are further simplified by using a Gaussian mixture model (GMM) to estimate the local density of the training data. The reliability of a classifier output is defined in terms of a confidence interval on the class membership probability. Either the lower bound of this confidence interval or the local density of training data may be used to detect unseen patterns. The effectiveness of the proposed method is demonstrated on real data sets, and its performance is compared with that of other reliability estimation methods.
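The abstract does not give the paper's exact formulas. The following is therefore only a minimal sketch of the general idea under stated assumptions, not the authors' method. It makes four assumptions:

- Each class is modelled by an already-fitted diagonal-covariance GMM.
- The class membership probability comes from Bayes' rule.
- The amount of training data supporting a test pattern x is approximated by a hypothetical effective count n_eff = N · p(x) · V, where N is the training-set size, p(x) the mixture density and V a hypothetical neighbourhood volume.
- The confidence interval is a Wilson score interval, a standard binomial interval swapped in here for illustration.

All names (`DiagGMM`, `class_posteriors`, `wilson_interval`, `reliability`, `volume`) are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class DiagGMM:
    """Gaussian mixture with diagonal covariances; parameters are assumed already fitted."""
    weights: List[float]          # mixing proportions, summing to 1
    means: List[List[float]]      # one mean vector per component
    variances: List[List[float]]  # one diagonal variance vector per component

    def pdf(self, x: Sequence[float]) -> float:
        total = 0.0
        for w, mu, var in zip(self.weights, self.means, self.variances):
            # log of a product of independent 1-D Gaussians (diagonal covariance)
            log_p = sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
                        for xi, m, v in zip(x, mu, var))
            total += w * math.exp(log_p)
        return total


def class_posteriors(x, class_models, priors) -> Tuple[List[float], float]:
    """Bayes' rule: P(c|x) = prior_c * p(x|c) / p(x). Also returns the mixture density p(x)."""
    joint = [pi * m.pdf(x) for pi, m in zip(priors, class_models)]
    px = sum(joint)
    if px == 0.0:
        # x is so far from the training data that all densities underflow
        return [1.0 / len(joint)] * len(joint), 0.0
    return [j / px for j in joint], px


def wilson_interval(p: float, n: float, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion p backed by n (possibly fractional) samples."""
    if n <= 0:
        return 0.0, 1.0  # no supporting data: the interval is uninformative
    z2 = z * z
    # algebraically equivalent form that stays stable as n -> 0
    centre = (n * p + z2 / 2) / (n + z2)
    half = z * math.sqrt(n * p * (1 - p) + z2 / 4) / (n + z2)
    return max(0.0, centre - half), min(1.0, centre + half)


def reliability(x, class_models, priors, n_train: int, volume: float):
    """Return (label, posterior, p(x), interval) for the most probable class.

    ASSUMPTION: n_eff = n_train * p(x) * volume stands in for the number of
    training points near x; a low lower bound flags a likely unseen pattern.
    """
    post, px = class_posteriors(x, class_models, priors)
    best = max(range(len(post)), key=post.__getitem__)
    n_eff = n_train * px * volume
    return best, post[best], px, wilson_interval(post[best], n_eff)


if __name__ == "__main__":
    # Two toy 2-D classes, each a single-component GMM (illustrative parameters).
    models = [DiagGMM([1.0], [[0.0, 0.0]], [[1.0, 1.0]]),
              DiagGMM([1.0], [[3.0, 3.0]], [[1.0, 1.0]])]
    for x in ([0.2, -0.1], [10.0, -8.0]):
        label, p, px, (lo, hi) = reliability(x, models, [0.5, 0.5], n_train=500, volume=1.0)
        print(x, label, round(p, 3), f"p(x)={px:.2e}", f"CI=({lo:.3f}, {hi:.3f})")
```

In this toy setup, the far point `[10.0, -8.0]` still receives a high posterior for class 0. Because the local density there is tiny, its interval spans nearly all of [0, 1], so its lower bound is close to zero. This is the kind of signal the abstract describes for detecting unseen patterns, though the paper's actual interval construction may differ.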
Similar resources
Statistical Analysis of Common Cause Failure Data to Support Safety and Reliability Analysis of Nuclear Plant Systems
This report describes the findings of a project “Statistical Analysis of Common Cause Failure Data to Support Safety and Reliability Analysis of Nuclear Plant Systems for the CNSC” under contract No. 87055-12-0221. Analysis of Common Cause Failures (CCF) is an important element of the Probabilistic Safety Assessment (PSA) of systems important to safety in a nuclear power plant. Based on the...
Design of efficient classifier integration and performance evaluation in machine learning
The characteristics of any classifier depend heavily on the nature of the data set used for training and verification. Application areas like health care have suffered from a lack of large and suitable datasets. A classifier designed for health care should show better generalization and robustness so that the end results it presents can be considered with high reliability and confi...
Reliability yields Information Gain
In this paper we prove that the reliability of the classifications of individual instances, provided by a classifier, results in information gain with respect to the accuracy of the classifier. We illustrate this result using our new approach to classification reliability called version space support vector machines.
A New Five-Parameter Distribution: Properties and Applications
In this paper, a new five-parameter lifetime and reliability distribution named “the exponentiated Uniform-Pareto distribution (EU-PD)” is proposed. It can take bathtub and inverse-bathtub shapes for modeling lifetime data. This distribution has applications in economics, actuarial modeling, reliability modeling, lifetime analysis, and the biological sciences. Firstly, the mathematical and st...
Model Selection and Assessment for Classification Using Validation
We address the problem of determining the size of the test set that can guarantee statistically significant results in classifier error estimation and in selecting the best classifier from a given set. We focus on the case of the 0-1 valued loss function and provide optimal one- and two-sided bounds for Validation (also known as the Hold-Out Estimate and the Train-and-Test Method). We al...
Journal title:
- Pattern Recognition Letters
Volume 29, Issue
Pages -
Publication date: 2008